Systematic review of birth cohort studies in Africa.

Aim: In sub-Saharan Africa, unacceptably high rates of mortality amongst women and children continue to persist. The emergence of research employing new genomic technologies is advancing knowledge on cause of disease. This review aims to identify birth cohort studies conducted in sub-Saharan Africa and to consider their suitability as a platform to support genetic epidemiological studies.
Methods: A systematic literature review was conducted to identify birth cohort studies in sub-Saharan Africa across the following databases: MEDLINE, EMBASE, AFRO and OpenSIGLE. A total of 8110 papers were retrieved. Application of inclusion/exclusion criteria retained only 189 papers, of which 71 met minimum quality criteria and were retained for full text analysis.
Results: The search revealed 28 birth cohorts: 14 of which collected biological data, 10 collected blood samples and only one study collected DNA for storage. These studies face many methodological challenges: notably, high rates of attrition and lack of funding for several rounds of study follow up. Population-based ‘biobanks’ have emerged as a major approach to harness genomic technologies in health research and yet the sub-Saharan African region still awaits large scale birth cohort biobanks collecting DNA and associated health and lifestyle data.
Conclusion: Investment in this field, together with related endeavours to foster and develop research capacity for these studies, may lead to an improved understanding of the determinants of intrauterine growth and development, birth outcomes such as prematurity and low birth weight, the links between maternal and infant health, survival of infectious diseases in the first years of life, and response to vaccines and antibiotic treatment.

In recent years, large 'biobank' studies have emerged as the most successful way of harnessing new genomic technologies, with the aim of providing resources for the future investigation of the separate and combined effects of genetic, environmental and lifestyle factors underlying multifaceted human diseases (10). The value of combining epidemiological and genomic data by using large scale cohort study design is growing in recognition and studies recruiting more than 500 000 participants are already established or underway. These studies, such as the UK Biobank (11), have focussed mainly on studying the major common diseases of public health importance, amongst adults in the developed world.
Many diseases of the poor, which represent the greatest health burden in terms of global mortality, have largely been neglected in this field of research. In 2008 there were an estimated 8.8 million child deaths; 5.97 million (68%) were caused by infectious diseases and nearly half (4.20 million) of the deaths occurred in Africa (12). Global child mortality has fallen since 1990, yet the targets outlined in the Millennium Development Goals to reduce this by two thirds before 2015 are not being met by many countries (13). In sub-Saharan African (SSA) countries, maternal complications of pregnancy and communicable diseases of women and children are still major public health concerns, with an unacceptably high burden of mortality (2,12,13). A study by Moran et al (14), at the Institute for International Health in Sydney, observed a marked discrepancy between funding for research and development relative to disease burden. This mismatch (with very low research investment relative to disease burden) was most notable for bacterial pneumonia and diarrhoea, which account for 18% and 15% of global child deaths respectively, most of these deaths being concentrated in several large developing countries in SSA and South East Asia (12). In addition, the transferability of research findings on determinants of disease from high-income countries to SSA may be limited. For example, different distributions of genetic polymorphisms have been found between African and European studies, and these determine genetic susceptibility to both communicable disease (eg, malaria, HIV, tuberculosis) and noncommunicable disease (eg, breast and prostate cancer) (15). Birth cohort studies conducted in high income countries fail to represent the conditions in poverty stricken areas of the world and there is a clear need for broader geographical representation in this area of research.
The aims of this review were 3-fold: 1. to provide a systematic review of birth cohort studies from SSA and discuss some important characteristics of these studies (population size, length of study, and follow up frequency); 2. to examine the methodology of and the data collected from birth cohort studies in SSA; 3. to offer recommendations on the feasibility and sustainability of support for this area of research based on the findings of this review and in the context of the existing literature.

Search strategy
After initial scoping exercises and input from a librarian to provide MeSH headings and keywords pertinent to this study, a systematic search was conducted across the following databases: 1) MEDLINE and EMBASE, using the search terms listed in Table 1; 2) AFRO, via the Global Health Library Regional Index, on 22 August 2010, using 'cohort' as a keyword; 3) grey literature, via OpenSIGLE, on 22 August 2010, using 'birth' and 'cohort' as keywords.
An informal search of Google Scholar produced no additional results.
Reference lists of finally selected papers were hand searched for further studies.
The aim of this search was to identify all birth cohort studies and not necessarily all publications relating to each study.

Inclusion/exclusion criteria
We defined a birth cohort study as a study collecting data from a group of people born at a similar time, by active (medical examinations, etc.) and/or passive (hospital records, etc.) surveillance, with follow up over a variable period of time (months to decades) (Table 2). Initial inclusion criteria were sensitive but not specific, so that we could retrieve longitudinal studies which, whilst not necessarily meeting strict definitions of birth cohort studies, are useful in providing an overview of the characteristics of studies collecting longitudinal data in infancy/childhood in SSA. For more specific birth cohort analysis, strict criteria were applied to studies in a follow-up to the initial assessment. Studies that met the criteria were retained for full text analysis in order to focus further on quantitative and qualitative aspects of data collection.

Developing countries
1. exp Developing Countries/
2. africa/ or africa, northern/ or algeria/ or egypt/ or libya/ or morocco/ or tunisia/ or "africa south of the sahara"/ or africa, central/ or cameroon/ or central african republic/ or chad/ or congo/ or "democratic republic of the congo"/ or equatorial guinea/ or gabon/ or africa, eastern/ or burundi/ or djibouti/ or eritrea/ or ethiopia/ or kenya/ or rwanda/ or somalia/ or sudan/ or tanzania/ or uganda/ or africa, southern/ or angola/ or botswana/ or lesotho/ or malawi/ or mozambique/ or namibia/ or south africa/ or swaziland/ or zambia/ or zimbabwe/ or africa, western/ or benin/ or burkina faso/ or cape verde/ or cote d'ivoire/ or gambia/ or ghana/ or guinea/ or guinea-bissau/ or liberia/ or mali/ or mauritania/ or niger/ or nigeria/ or senegal/ or sierra leone/ or togo/ or "antigua and barbuda"/ or cuba/ or dominica/ or dominican republic/ or grenada/ or guadeloupe/ or haiti/ or jamaica/ or "saint kitts and nevis"/ or saint lucia/ or "saint vincent and the grenadines"/ or central america/ or belize/ or costa rica/ or el salvador/ or guatemala/ or honduras/ or nicaragua/ or panama/ or panama canal zone/ or mexico/ or argentina/ or bolivia/ or brazil/ or chile/ or colombia/ or ecuador/ or guyana/ or paraguay/ or peru/ or suriname/ or uruguay/ or venezuela/ or asia, central/ or kazakhstan/ or kyrgyzstan/ or tajikistan/ or turkmenistan/ or uzbekistan/ or cambodia/ or east timor/ or indonesia/ or laos/ or malaysia/ or myanmar/ or philippines/ or thailand/ or vietnam/ or asia, western/ or bangladesh/ or bhutan/ or india/ or sikkim/ or afghanistan/ or iran/ or iraq/ or jordan/ or lebanon/ or syria/ or turkey/ or yemen/ or nepal/ or pakistan/ or sri lanka/ or exp china/ or korea/ or "democratic people's republic of korea"/ or "republic of korea"/ or mongolia/ or albania/ or lithuania/ or bosnia-herzegovina/ or bulgaria/ or byelarus/ or "macedonia (republic)"/ or moldova/ or montenegro/ or romania/ or russia/ or serbia/ or ukraine/ or yugoslavia/ or exp transcaucasia/ or armenia/ or azerbaijan/ or "georgia (republic)"/ or comoros/ or madagascar/ or mauritius/ or seychelles/ or fiji/ or papua new guinea/ or vanuatu/ or palau/ or samoa/ or tonga/
10. low income countr*.tw.
11. middle income countr*.tw.
12. (low adj2 middle income countr*).tw.
*Within each box all terms were combined with Boolean operator OR. Developing countries were those defined by the World Bank list of economies (July 2010) as low- or middle-income.

Data extraction
Studies included in Stage 1 were extracted to an Excel file and analysed by abstract alone. This exercise aimed to provide background perspective of longitudinal studies in SSA rather than provide comprehensive data extraction. Studies included for full paper review (Stage 2) were assessed in full and the data obtained was intended to be comprehensive.
Both qualitative and quantitative data were assessed. Studies were assessed by categories that included, but were not limited to, biological samples (such as blood, DNA and urine); anthropometric data; cognitive, psychological and other developmental indicators; socioeconomic data; methodological challenges; and attrition.

RESULTS
Abstract analysis of 189 papers produced data from 124 longitudinal cohort studies (Table 3 and Figure 3) meeting Stage 1 criteria. Abstracts were analysed for basic study characteristics (Table 3).

Study characteristics - Stage 1
Study size: The study sizes ranged from 30 to 11 342 participants. 79 of the studies recruited 1000 or fewer participants and only 27 studies recruited more than 1000 at induction. 59 recruited fewer than the 500 participants set as a quality criterion in this study.
Follow up: 53 studies began data collection either in the antenatal period or at birth. The follow up period of participants ranged from 3 months to 20 years. The 'Birth To Twenty' study in South Africa is the longest running: it has followed the initial cohort for 20 years and is still ongoing. The majority of the studies (n = 63), however, followed the cohort for 2 years or less.
Frequency: Frequency of follow up ranged from daily to two-yearly. Most abstracts did not report data on follow up frequency and full text analysis was needed.
Methods: 48 studies were identified that followed a specific population. Of the studies for which a setting could be identified, the large majority were conducted in the community.

Study characteristics - Stage 2
The study characteristics of the full paper reviews are found in Table 4. Analysis of 71 full text publications produced data for 28 separate birth cohort studies. All further results relate to these 28 studies.
Study size: Of the 28 studies retained for full paper analysis, the median number of participants at induction was 1272, ranging from 571 to 11 342.
Follow up: Median follow up was 24 months, ranging from 12 to 216 months. Median frequency of follow up was bimonthly, ranging from bi-weekly to five-yearly.
Age at induction: The mean age of study participants at induction was 17 months.
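The screening counts reported in this review (8110 papers retrieved, 189 retained after the inclusion/exclusion criteria, 71 analysed in full, 28 separate cohorts) can be read as a simple funnel. The following is a minimal sketch of that arithmetic; the step labels and function name are illustrative and not taken from the review itself.

```python
# Counts reported in this review at each screening step.
steps = [
    ("papers retrieved", 8110),
    ("papers retained after inclusion/exclusion criteria", 189),
    ("papers retained for full text analysis", 71),
]
COHORTS_IDENTIFIED = 28  # separate birth cohorts described by the 71 papers


def retention_percent(previous: int, current: int) -> float:
    """Share of the previous step that survives to the current step."""
    return 100.0 * current / previous


# Compare each step with the one before it.
for (_, previous), (label, current) in zip(steps, steps[1:]):
    share = retention_percent(previous, current)
    print(f"{label}: {current} ({share:.1f}% of previous step)")

# Average number of full-text papers per identified cohort.
papers_per_cohort = steps[-1][1] / COHORTS_IDENTIFIED
print(f"full-text papers per cohort: {papers_per_cohort:.1f}")
```

Roughly 2% of retrieved papers survived the first screen, and each cohort was described by between two and three full-text papers on average, which is consistent with the difficulty of grouping publications by study noted under Study limitations.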

Biological measurements
Blood: Ten of the studies collected blood from study participants (Table 5). A number of these ten studies also took other blood samples, including: maternal blood (n = 8), cord blood (n = 5), placental blood (n = 3) and blood smears

Other biological samples
Very few other biological samples were taken by studies.
Thirteen of the 28 studies collected no biological data at all.
Anthropometry: 18 studies detailed methods of anthropometric measurements. The World Health Organization/National Center for Health Statistics growth references were the most commonly used.
Other data: Only one study measured psychological variables. Measurement of nutrition, cognitive development, socio-economic status and psychological variables most frequently used questionnaire based data collection.

Loss to follow up
Attrition in SSA birth cohorts, measured from induction to last follow up, had a median of 28%, ranging from 0% to 72.9%. Attrition rates were as follows: 0 to 10% attrition in 4 studies, 11-20% in 1 study, 21-30% in 4 studies, 31-40% in 3 studies and more than 40% in 4 studies. Twelve studies did not clearly report loss to follow up.
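As a minimal sketch of how attrition "from induction to last follow up" can be computed and banded, the snippet below defines attrition as the share of enrolled participants not seen at the last follow up. The function names and the handling of band boundaries (each band includes its upper limit) are assumptions made for illustration; the band counts are those reported above.

```python
def attrition_percent(enrolled: int, at_last_follow_up: int) -> float:
    """Percentage of the enrolled cohort lost by the last follow up."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    if not 0 <= at_last_follow_up <= enrolled:
        raise ValueError("at_last_follow_up must lie between 0 and enrolled")
    return 100.0 * (enrolled - at_last_follow_up) / enrolled


def attrition_band(percent: float) -> str:
    """Assign a percentage to one of the bands used in the text.

    Assumption: each band includes its upper limit (e.g. 10.0 falls in 0-10%).
    """
    for upper, label in ((10, "0-10%"), (20, "11-20%"), (30, "21-30%"), (40, "31-40%")):
        if percent <= upper:
            return label
    return ">40%"


# Number of studies per band, as reported in this review.
reported_bands = {"0-10%": 4, "11-20%": 1, "21-30%": 4, "31-40%": 3, ">40%": 4}
NOT_REPORTED = 12

studies_reporting = sum(reported_bands.values())
total_studies = studies_reporting + NOT_REPORTED
print(f"{studies_reporting} studies reported attrition; {total_studies} studies in total")
```

The band counts account for 16 studies which, with the 12 that did not clearly report loss to follow up, make up the 28 cohorts reviewed. The reported median and range therefore describe only the 16 reporting studies.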
Problems reported by studies leading to high rates of attrition included: high infant mortality rates, family relocation, refusal to participate, maternal death, wave attrition (not present for one follow up, but return for the next) and failure of researchers to retrace study individuals.
Efforts to promote successful follow up included providing participants with incentives: for example, 'Birth To Twenty' provided participants with basic mobile phones to aid contact; the Asembo Bay cohort study provided free health clinics for all participants and a gift pack containing baby care items was given to mothers (this reduced attrition from 18.9% in 1992 to 7.7% in 1994). The Asembo Bay study also provided free medications, immunizations, and transportation to and from the study clinic, which contributed to a rate of loss to follow-up of only 4% over the first 18 months of follow up. Other factors contributing to successful follow up included modern communication systems, close and frequent contact with all study sites for standardisation, short periods between phases of data collection and detailed contact information stored in computerised systems. These measures, however, were not appropriate or feasible for the majority of the studies reviewed.

DISCUSSION
This review examined birth cohort studies in SSA, their methodologies and the challenges they face, in light of the absence of biobank studies in this region and with a view to making recommendations about future plans for this area of research. The review shows that, although a number of efforts exist, they evolved in their local settings and were not planned or organized in any systematic way. They also very rarely (if ever) collect and store DNA material and frozen plasma for further genetic and biochemical studies.

Study limitations
Attempts to overcome publication bias were made by searching for unpublished literature via OpenSIGLE (System for Information on Grey Literature in Europe), an open access source of bibliographical references of reports and other grey literature produced in Europe until 2005. It is, however, possible that some sources of grey literature were missed. Only two papers were retrieved in French, and none in Portuguese, despite the large Francophone and Lusophone populations in Africa. Although no language restrictions were applied, this may suggest that the design of the search criteria was not optimised for retrieving these studies.
Due to the longitudinal nature of cohort studies, large numbers of papers are produced over long periods of time from multiple rounds of study follow up. It is, as a result, challenging to capture the most updated description of an individual study. Efforts were made to group papers by study, both by hand (matching authors, years and sample sizes) and citation mapping, in order to identify the most recent description of any particular study and to capture all the data across the spread of publications arising from a single cohort. It is, however, possible that the current status of a study was not identified. Study characteristics are often reported differently between publications as cohort studies continue to develop over time in their methods and study sub-populations. It is possible that the conclusions of this study about the lack of good biological data collection reflect unreported data or failure to capture data, rather than a genuine lack of samples. Writing to authors may have helped to identify the most comprehensive and current descriptions of birth cohort studies.

Study quality
Comparability across birth cohort reviews is complicated by the different definitions of birth cohort studies. The definition is generally recognised as the longitudinal follow up of a group of people born at, or around, the same time. The descriptive elements considered important in this review include the study size at enrolment, the age of study participants at enrolment, the duration of follow up and the frequency of these follow ups. This review identified a subgroup of twenty eight studies that met strict criteria (minimum enrolment of 500 or more study participants with at least 12 months of follow up and enrolment at less than 60 months of age) with a view to identifying studies suitable for biobank data collection for the purposes of studying important causes of child health and development.
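The strict criteria stated above (enrolment of 500 or more participants, at least 12 months of follow up, and enrolment at less than 60 months of age) amount to a simple filter. The sketch below is illustrative only: the record layout, field names and example studies are hypothetical and are not data from this review.

```python
from dataclasses import dataclass

# Thresholds taken from the strict criteria described in the text.
MIN_ENROLMENT = 500                 # participants at enrolment
MIN_FOLLOW_UP_MONTHS = 12           # duration of follow up
MAX_AGE_AT_ENROLMENT_MONTHS = 60    # enrolment must be below this age


@dataclass
class CohortStudy:
    """Hypothetical record of one study's descriptive characteristics."""
    name: str
    enrolled: int
    follow_up_months: float
    age_at_enrolment_months: float


def meets_strict_criteria(study: CohortStudy) -> bool:
    """True if the study satisfies all three strict criteria."""
    return (
        study.enrolled >= MIN_ENROLMENT
        and study.follow_up_months >= MIN_FOLLOW_UP_MONTHS
        and study.age_at_enrolment_months < MAX_AGE_AT_ENROLMENT_MONTHS
    )


# Hypothetical studies, for illustration only.
examples = [
    CohortStudy("hypothetical A", enrolled=571, follow_up_months=12, age_at_enrolment_months=0),
    CohortStudy("hypothetical B", enrolled=450, follow_up_months=24, age_at_enrolment_months=0),
    CohortStudy("hypothetical C", enrolled=3000, follow_up_months=6, age_at_enrolment_months=0),
    CohortStudy("hypothetical D", enrolled=1200, follow_up_months=36, age_at_enrolment_months=60),
]

retained = [study.name for study in examples if meets_strict_criteria(study)]
print(retained)
```

Only the first hypothetical study passes: the others fail on enrolment size, follow up duration and age at enrolment respectively.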
This review demonstrated, first, that the majority of birth cohort studies researched a specific sub-population rather than representative population samples (eg, assessing HIV related outcomes in offspring of HIV positive mothers). Second, the SSA birth cohorts are largely restricted to small sample sizes with short periods of active follow up, with sample sizes varying from 571 to 11 342. Only two birth cohort studies were identified with study sizes greater than 5000. The 'Birth to Twenty' study in South Africa began following an unselected population of 3275 mothers and their children in 1990 and is still ongoing today. It is the best example of a large scale, long term birth cohort in SSA. However, it is of relatively small scale when compared to similar birth cohort studies in developed countries such as the Avon Longitudinal Study of Parents and Children (ALSPAC), which studied 13 971 births at induction (2). Typical genome wide association study sizes comprise several thousand, or even tens of thousands, of individuals. Recommended sample sizes for gene-environment interaction studies are of the order of twenty thousand participants with a specific outcome of interest (16). The small sample sizes of the African studies identified in this review are unlikely to provide a secure basis upon which to study the genetic and environmental influences on health and development outcomes in pregnancy and early childhood.
Third, the large majority of studies followed the cohort for less than two years, an insufficient amount of time on which to draw conclusions relating to the long term influence of developmental factors on future child, adolescent and adult health.
Finally, there is no single coordinated definition of a birth cohort and measurements vary greatly between different studies. It is, therefore, difficult to combine study results or conduct meta-analyses of data due to inherent differences in birth cohort study design and measurement methods. Efforts have been made to bring together data from the largest low and middle income country birth cohorts as part of the work of the Consortium of Health Orientated Research in Transitioning Societies (COHORTS), but this initiative does not include the creation of a large biological resource (17). The five largest prospective birth cohorts with sample sizes of 2000 or more newborns and at least 15 years of follow up were included in this initiative and the data sets of these studies pooled. Some of the challenges identified by the COHORTS group included differences in variable definitions and measurement techniques; different ages for which data are available; and different time periods captured by each study. These differences between the studies resulted in restriction of their analyses to only those variables which were collected consistently across the cohorts. These limitations also apply to the SSA studies identified in this review.

Biological data
Human genomic studies have revolutionized our understanding of disease and rapid progress has been made in high income countries with completion of the human genome project, emergence of genome wide association studies and the prospect of whole genome sequencing and pharmacogenomics (18,19).
Africa, where all human populations originated, is the most genetically diverse region in the world. To date, the relative risks (or odds ratios) for complex diseases associated with genetic loci, studied mainly in high-income countries, have been small (1.5 or less) (20). People of African origin display shorter linkage disequilibrium (LD) blocks, allowing for more precise mapping of loci associated with disease risk and the potential to discover disease causing variants which may previously have been masked by large LD blocks in European populations (21,22). Genetic factors do not account for chronic disease susceptibility alone; rather, they interact with environmental exposures to determine disease risk (23). Africa's genetic diversity, combined with its environmental diversity, unique life exposures and natural selection pressures, presents many exciting possibilities for genetic research.
This review serves to highlight the lack of systematically collected birth cohort data on genetic, environmental and lifestyle factors underlying child health and development problems of SSA. Only one of the birth cohorts identified took DNA samples to establish a DNA bank, and this contained only 2200 individuals. The majority of DNA samples taken by studies were one-off measurements for diagnostic assessment of HIV or malaria status. The 473 GWA articles contained in the National Human Genome Research Institute catalogue were assigned weight according to country of origin in a study by Rosenburg et al. (24). These comparative weightings showed that the contribution of sub-Saharan African countries to genome wide association studies, even when all SSA country inputs are combined (0.34), is insignificant compared to those of high income countries (eg, 205.5 in USA, 68.15 in UK and 37.02 in Germany).
Overall, 13 of the 28 studies collected no biological samples of any sort, reflecting either a primary interest in alternative data collection or a simple lack of resources, manpower and laboratory facilities to do so.
The same technologies which are being used in developed world biobanks have the potential to generate new knowledge about communicable diseases amongst mothers and children in Africa. However, a gaping divide exists in clinical and genomic research capacity between SSA and higher income countries (21). DNA based studies require stringent quality criteria for complex processing and storage of samples, access to laboratories which are equipped with state-of-the-art facilities and run by well trained staff (22). The complexity of undertaking these studies could, however, foster local capacity building and drive innovation for new research opportunities and development in SSA.
Genome-based studies in developing countries present important ethical considerations (25). Valid consent must be obtained in a way that ensures an informed and voluntary choice can be made by study participants, regardless of their level of education and literacy. Protecting the privacy of study participants is an essential consideration as GWA studies have the potential to reveal stigmatising information about an individual or population which may be used for harm (26). Due to the lack of large scale genotyping facilities in most sub-Saharan African countries, samples may require storage and export for processing in high income countries (27). It is essential that a balance is struck between protection of study participants' privacy and the need for data sharing and release in research (28). Strict guidelines on sample handling and destruction are often required, limiting the ability for secondary analysis and reuse of archived samples (29). Obtaining ethical approval for genomic research in developing countries is understandably a complex and challenging process. However, these challenges can be successfully met, as the experience in malaria research has shown (25).

Challenges of longitudinal studies in low-income countries
Longitudinal studies pose unique methodological challenges to researchers. In birth cohorts two types of sample loss are reported: initial non-enrolment and attrition on follow up. Both have the potential to cause systematic bias in collection and interpretation of results. In the developing world, failure to trace individuals is reported as the most common cause of attrition (30). Lack of infrastructure, administrative centres, national databases and aids such as widespread patient identifiers in SSA pose a challenge to data collectors. High rates of migration also pose a challenge to longitudinal studies especially as the more educated, urban section of a cohort may be more likely to migrate, potentially resulting in a sample no longer representative of the original population from which it was taken.
Efforts to overcome attrition have included providing participants with incentives to continue with the study; however, there is a risk of subsequently conditioning the cohort such that they are no longer representative of the normal population. Other studies have used national census information, army enlistment days and systematic searching of all homes in the study area to retrace study participants.
Despite the methodological challenges faced by longitudinal cohorts in the developing world, they are achievable, and studies such as the Pelotas birth cohort in Brazil are testament to this (31). The study, which began in 1982 measuring over 4000 variables for 5914 study participants, is one of the largest and longest running birth cohorts in the developing world and is still ongoing today. Household sampling, army enlistment and the low emigration rate in Pelotas limited attrition, and follow up in 2005 retraced 77% of the original cohort.

Funding
Most birth cohort studies report difficulty in attracting funding for initiating studies and then supporting multiple rounds of follow up. There is a particular need for multiple sources of funding if birth cohort studies are to collect biological samples for biobank data. The UK Biobank, a cohort of 500 000 people with a baseline assessment and 8 year follow up, is projected to cost US$ 104 million (€ 72.5 million) (32). The UK Biobank was funded by the Wellcome Trust (the UK's largest independent medical research charity), the Medical Research Council, the Department of Health, the Scottish Government, British Heart Foundation, the Northwest Regional Development Agency and others.
One study estimated that setting up a ten year study similar to the UK Biobank in SSA, with additional exposure measurements, intervention trials and research capacity building, would cost anywhere between US$ 23.7 million (€ 16.5 million) for a cohort of 150 000 people across three countries and US$ 2.56 billion (€ 1.8 billion) for a cohort of 400 000 people across four countries (33).
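To put the cost figures quoted above on a common footing, they can be divided by cohort size. This is a rough sketch using only the totals and cohort sizes given in the text; the labels and function name are illustrative, and the comparison ignores differences in study duration (8 year follow up versus a ten year study) and in what each budget covers.

```python
# (total cost in US$, number of participants), as quoted in the text.
estimates = {
    "UK Biobank, 500 000 people": (104e6, 500_000),
    "SSA estimate, 150 000 people in three countries": (23.7e6, 150_000),
    "SSA estimate, 400 000 people in four countries": (2.56e9, 400_000),
}


def cost_per_participant(total_usd: float, participants: int) -> float:
    """Total projected cost divided by cohort size."""
    return total_usd / participants


per_participant = {
    label: cost_per_participant(total, n) for label, (total, n) in estimates.items()
}

for label, value in per_participant.items():
    print(f"{label}: about US$ {value:,.0f} per participant")
```

On these figures the UK Biobank works out at about US$ 208 per participant, while the two SSA estimates span roughly US$ 158 to US$ 6400 per participant. That is an approximately forty-fold spread, which illustrates how strongly the projected cost depends on the scope of the exposure measurements, trials and capacity building included.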

CONCLUSION
This review identified a large number of relevant studies from 28 sites in Africa. Only one birth cohort study which systematically collected DNA samples and related health data was identified, but this was of a small scale. Investment in research training, infrastructure and pilot studies, alongside the creation of ethical frameworks, quality assessment and locating long term sources of funding, are just a few of the initial challenges that need to be addressed to establish and then ensure the sustainability of such biobanks in SSA. Governments and not-for-profit agencies have made large investments towards funding biobanks in high-income countries. We suggest that it is now time they turned their resources towards investing in the research capacity of SSA, and in doing so, investing in the future of mothers and children upon whom a large burden of avoidable mortality is centred.
Ethical approval: Not required.
Authorship declaration: Both authors designed and conducted the study and wrote the paper.
Competing interests: All authors have completed the Unified Competing Interest form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: no support from any organization for the submitted work; no financial relationships with any organizations that might have an interest in the submitted work in the previous 3 years; no other relationships or activities that could appear to have influenced the submitted work.